Sorting on a Graphics Processing Unit (GPU)
Abstract
2.1 Graphics Processing Units
2.2 Sorting Numbers on GPUs
2.2.1 SDK Radix Sort Algorithm
2.2.1.1 Step 1–Sorting tiles
2.2.1.2 Step 2–Calculating histogram
2.2.1.3 Step 3–Prefix sum of histogram
2.2.1.4 Step 4–Rearrangement
2.2.2 GPU Radix Sort (GRS)
2.2.2.1 Step 1–Histogram and ranks
2.2.2.2 Step 2–Prefix sum of tile histograms
2.2.2.3 Step 3–Positioning numbers in a tile
2.2.3 SRTS Radix Sort
2.2.3.1 Step 1–Bottom-level reduce
2.2.3.2 Step 2–Top-level scan
2.2.3.3 Step 3–Bottom-level scan
2.2.4 GPU Sample Sort
2.2.4.1 Step 1–Splitter selection
2.2.4.2 Step 2–Finding buckets
2.2.4.3 Step 3–Prefix sum
2.2.4.4 Step 4–Placing elements into buckets
2.2.5 Warpsort
2.2.5.1 Step 1–Bitonic sort by warps
2.2.5.2 Step 2–Bitonic merge by warps
2.2.5.3 Step 3–Splitting long sequences
2.2.5.4 Step 4–Final merge by warps
2.2.6 Comparison of number sorting algorithms
2.3 Sorting Records on GPUs
2.3.1 Record Layouts
2.3.2 High-Level Strategies for Sorting Records
2.3.3 Sample Sort for Sorting Records
2.3.4 SRTS for Sorting Records
2.3.5 GRS for Sorting Records
2.3.6 Comparison of record sorting algorithms
2.3.7 Run Times for ByField layout
2.3.8 Run Times for Hybrid layout
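The outline lists four steps for the SDK radix sort: sorting tiles, calculating a histogram, taking a prefix sum of the histogram, and rearrangement. As a rough illustration only, the following sequential Python sketch simulates one least-significant-digit pass in that four-step shape. It assumes non-negative integer keys, 4-bit digits and an arbitrary tile size; the names `digit`, `radix_pass` and `radix_sort` are hypothetical, and the code is not taken from the thesis or from the CUDA SDK.

```python
from typing import List

RADIX_BITS = 4                   # assumed digit width: 4 bits per pass
RADIX = 1 << RADIX_BITS          # 16 buckets per digit


def digit(key: int, shift: int) -> int:
    """Return the RADIX_BITS-wide digit of `key` that starts at bit `shift`."""
    return (key >> shift) & (RADIX - 1)


def radix_pass(keys: List[int], shift: int, tile_size: int) -> List[int]:
    """One stable LSD pass, arranged like the four SDK radix sort steps."""
    # Step 1: cut the input into tiles and stably sort each tile by the digit
    # (on a GPU, one thread block would own one tile).
    tiles = [sorted(keys[i:i + tile_size], key=lambda k: digit(k, shift))
             for i in range(0, len(keys), tile_size)]

    # Step 2: per-tile histogram of the current digit.
    hist = [[0] * RADIX for _ in tiles]
    for t, tile in enumerate(tiles):
        for k in tile:
            hist[t][digit(k, shift)] += 1

    # Step 3: exclusive prefix sum over the histogram table in digit-major
    # order (every tile's count for digit 0, then digit 1, ...), giving each
    # (tile, digit) pair its first output index.
    offset = [[0] * RADIX for _ in tiles]
    running = 0
    for d in range(RADIX):
        for t in range(len(tiles)):
            offset[t][d] = running
            running += hist[t][d]

    # Step 4: rearrangement. Keys sharing a digit are contiguous in a sorted
    # tile, so each key goes to its (tile, digit) offset plus its rank in that run.
    out = [0] * len(keys)
    for t, tile in enumerate(tiles):
        rank = [0] * RADIX
        for k in tile:
            d = digit(k, shift)
            out[offset[t][d] + rank[d]] = k
            rank[d] += 1
    return out


def radix_sort(keys: List[int], key_bits: int = 32, tile_size: int = 256) -> List[int]:
    """Sort non-negative integers smaller than 2**key_bits, least significant digit first."""
    for shift in range(0, key_bits, RADIX_BITS):
        keys = radix_pass(keys, shift, tile_size)
    return keys
```

For example, `radix_sort([170, 45, 75, 90, 802, 24, 2, 66], tile_size=3)` returns `[2, 24, 45, 66, 75, 90, 170, 802]`. Laying out the prefix sum in digit-major order is what keeps each pass stable across tiles, a property LSD radix sort depends on; the step lists for GRS and SRTS in the outline suggest a similar histogram, scan and scatter structure organized differently.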
Similar Resources
Ultra-Fast Image Reconstruction of Tomosynthesis Mammography Using GPU
Digital Breast Tomosynthesis (DBT) is a technology that creates three-dimensional (3D) images of breast tissue. Tomosynthesis mammography detects lesions that are not detectable with other imaging systems. If image reconstruction time is on the order of seconds, we can use tomosynthesis systems to perform tomosynthesis-guided interventional procedures. This research has been designed to study u...
Parallel Implementation of Particle Swarm Optimization Variants Using Graphics Processing Unit Platform
There are different variants of Particle Swarm Optimization (PSO) algorithm such as Adaptive Particle Swarm Optimization (APSO) and Particle Swarm Optimization with an Aging Leader and Challengers (ALC-PSO). These algorithms improve the performance of PSO in terms of finding the best solution and accelerating the convergence speed. However, these algorithms are computationally intensive. The go...
Performance Analysis of Parallel Sorting Algorithms using GPU Computing
Sorting is a well-studied problem in computer science. Many authors have invented numerous sorting algorithms for the CPU (Central Processing Unit). Today, sorting on the CPU is not efficient enough, and efficient sorting requires parallelization. There are many ways to parallelize sorting, but at present GPU (Graphics Processing Unit) computing is the most ...
Implementation of the direction of arrival estimation algorithms by means of GPU-parallel processing in the Kuda environment (Research Article)
Direction-of-arrival (DOA) estimation of audio signals is critical in several areas, including electronic warfare and sonar. Beamforming methods such as Minimum Variance Distortionless Response (MVDR) and Delay-and-Sum (DAS), and the subspace-based Multiple Signal Classification (MUSIC), are the best-known DOA estimation techniques. These methods have high computational complexity. Hence using...
GPUMemSort: A High Performance Graphics Co-processors Sorting Algorithm for Large Scale In-Memory Data
In this paper, we present a GPU-based sorting algorithm, GPUMemSort, which achieves high performance in sorting large-scale in-memory data by taking advantage of GPU processors. It consists of two algorithms: an in-core algorithm, which is responsible for sorting data in GPU global memory efficiently, and an out-of-core algorithm, which is responsible for dividing large-scale data into multiple c...
The Comparison of Parallel Sorting Algorithms Implemented on Different Hardware Platforms
Sorting is a common problem in computer science. There are many well-known sorting algorithms designed for sequential execution on a single processor. Recently, many-core and multi-core platforms have enabled the creation of widely parallel algorithms. We have standard processors that consist of multiple cores, and hardware accelerators such as the GPU. Graphics cards, with their parallel architect...